FY7 polymerase

ABSTRACT

A purified recombinant thermostable DNA polymerase which exhibits at least about 80% activity at salt concentrations of 50 mM and greater, at least about 70% activity at salt concentrations of 25 mM and greater, and has a processivity of about 30 nucleotides per binding event. An isolated nucleic acid that encodes the thermostable DNA polymerase, as well as a recombinant DNA vector comprising the nucleic acid and a recombinant host cell transformed with the vector, are also disclosed. A method of sequencing DNA using the DNA polymerase and a kit for sequencing DNA are also disclosed.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to United States Provisional Application Serial No. 60/089,556, filed on Jun. 17, 1998, the disclosure of which is incorporated herein in its entirety.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The instant disclosure pertains to thermostable DNA polymerases which exhibit improved robustness and efficiency.

2. Background

DNA polymerases are enzymes which are useful in many recombinant DNA techniques such as nucleic acid amplification by the polymerase chain reaction (“PCR”), self-sustained sequence replication (“3SR”), and high temperature DNA sequencing. Thermostable polymerases are particularly useful. Because heat does not destroy the polymerase activity, there is no need to add additional polymerase after every denaturation step.

However, many thermostable polymerases have been found to display a 5′ to 3′ exonuclease or structure-dependent single-stranded endonuclease (“SDSSE”) activity which may limit the amount of product produced or contribute to the plateau phenomenon in the normally exponential accumulation of product. Such 5′ to 3′ nuclease activity may contribute to an impaired ability to efficiently generate long PCR products greater than or equal to 10 kb, particularly for G+C rich targets. In DNA sequencing applications and cycle sequencing applications, the presence of 5′ to 3′ nuclease activity may contribute to a reduction in desired band intensities and/or generation of spurious or background bands.

Additionally, many of the enzymes presently available are sensitive to high salt environments and have low processivity, that is, they incorporate few nucleotides per DNA polymerase binding event. Furthermore, addition of dITP to the reaction mixture to address compression problems usually results in reduced activity of the enzyme.

Thus, a need continues to exist for an improved DNA polymerase having increased tolerance to high salt conditions, efficient utilization of dITP, high productivity, and improved performance on GC-rich templates.

BRIEF SUMMARY OF THE INVENTION

The instant disclosure teaches a purified recombinant thermostable DNA polymerase comprising the amino acid sequence set forth in FIG. 1, as well as a purified recombinant thermostable DNA polymerase which exhibits at least about 80% activity at salt concentrations of 50 mM and greater. The instant disclosure further teaches a purified recombinant thermostable DNA polymerase which exhibits at least about 70% activity at salt concentrations of 25 mM and greater, and a purified recombinant thermostable DNA polymerase having a processivity of about 30 nucleotides per binding event.

The instant disclosure also teaches an isolated nucleic acid that encodes a thermostable DNA polymerase, wherein said nucleic acid consists of the nucleotide sequence set forth in FIG. 1, as well as a recombinant DNA vector that comprises the nucleic acid, and a recombinant host cell transformed with the vector.

The instant disclosure also teaches a method of sequencing DNA comprising the step of generating chain terminated fragments from the DNA template to be sequenced with the DNA polymerase in the presence of at least one chain terminating agent and one or more nucleotide triphosphates, and determining the sequence of said DNA from the sizes of said fragments. The instant disclosure also teaches a kit for sequencing DNA comprising the DNA polymerase.

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWINGS

FIG. 1 depicts the amino acid sequence (SEQ ID No. 2) (and DNA sequence encoding therefor (SEQ ID No. 1)) for the FY7 polymerase.

FIG. 2 depicts the DNA sequence (SEQ ID No. 3) of M13mp18 DNA sequenced using the FY7 polymerase formulated in Mn conditions, as shown by a print out from an ABI model 377 automated fluorescent DNA sequencing apparatus.

FIG. 3 depicts the DNA sequence (SEQ ID No. 4) of M13mp18 DNA sequenced using the FY7 polymerase formulated in Mg conditions, as shown by a print out from an ABI model 377 automated fluorescent DNA sequencing apparatus.

FIG. 4 depicts the percent of maximum polymerase activity for Thermo Sequenase™ enzyme DNA polymerase versus FY7 DNA polymerase under varying KCl concentrations.

FIG. 5 depicts the effect of high salt concentrations on DNA sequencing ability in radioactively labeled DNA sequencing reactions using Thermo Sequenase™ enzyme DNA polymerase versus FY7 DNA polymerase.

FIGS. 6-10 (SEQ ID Nos. 5-9, respectively) depict the effect of increasing salt concentration on the performance of Thermo Sequenase. At concentrations as low as 25 mM, data quality is affected, with the read length decreasing from at least 600 bases to about 450 bases. At 50 mM salt the read length is further decreased to about 350 bases, at 75 mM to about 250 bases, and at 100 mM the read length is negligible.

FIGS. 11-15 (SEQ ID Nos. 10-14, respectively) depict the effect of increasing salt concentration on the performance of FY7 DNA polymerase. There is no detrimental effect on performance to at least 75 mM KCl and only a slight decrease in data quality at 100 mM KCl.

FIG. 16 depicts the processivity measured for Thermo Sequenase DNA polymerase and AmpliTaq FS DNA polymerase, compared with the processivity measured for FY7 DNA polymerase.

FIG. 17 depicts the improved read length obtained when using FY7 polymerase versus Thermo Sequenase DNA polymerase in radioactively labeled sequencing reactions incorporating the dGTP (Guanosine triphosphate) analog dITP (Inosine triphosphate) at 72° C.

FIGS. 18-22 (SEQ ID Nos. 15-19, respectively) show the effect of increasing extension step time on the read length and data quality produced by Thermo Sequenase DNA polymerase in fluorescently labeled terminator DNA sequencing reactions.

FIGS. 23-27 (SEQ ID Nos. 20-24, respectively) show the effect of increasing extension step time on the read length and data quality produced by FY7 DNA polymerase in fluorescently labeled terminator DNA sequencing reactions.

DETAILED DESCRIPTION OF THE INVENTION

A series of polymerase mutants was constructed with the aim of obtaining an improved polymerase for DNA sequencing by reducing the exonuclease activity found in full length Thermus thermophilus and Thermus aquaticus DNA polymerase I enzymes. Six conserved motifs (Gutman and Minton (1993) Nucleic Acids Research 21, 4406-4407) can be identified in the amino-terminal domain of pol I type polymerases, in which the 5′ to 3′ exonuclease activity has been shown to reside. Further, six carboxylate residues in these conserved regions have been shown in a crystal structure to be located at the active site of the exonuclease domain of Thermus aquaticus DNA pol I (Kim et al., (1995) Nature 376, 612-616). Point mutations were made by site-directed mutagenesis to carboxylates and other residues in three of the six conserved motifs in Tth and Taq polymerases as follows. In Taq: D18A, T140V, and D142N/D144N, each in combination with the mutation F667Y, which lies outside of the exonuclease domain. In Tth: D18A, T141V, and D143N/D145N, each in combination with the mutation F669Y, which lies outside of the exonuclease domain.

All polymerases were evaluated for exonuclease activity, processivity, strand displacement, salt tolerance, thermostability, and sequencing quality. One FY7 polymerase, Tth D18A, F669Y, is described in further detail below.

EXAMPLES

Methods

In Vitro Mutagenesis

PCR was employed to introduce an aspartic acid to alanine amino acid change at codon 18 (D18A) of cloned full length F669Y Tth (plasmid pMR10). Mutagenic Primer 1 (CTGTTCGAACCCAAAGGCCGTGTCCTCCTGGTGGCCGGCCACCAC) (SEQ ID No. 25) spans nucleotides 19-60 of pMR10 including codon 18 and a BstBI restriction site. Oligonucleotide Primer 2 (GAGGCTGCCGAATTCCAGCCTCTC) (SEQ ID No. 26) spans an EcoRI site of pMR10. pMR10 was used as template DNA. The PCR product was digested with BstBI and EcoRI and ligated to two fragments of pMR10: a 5000 bp KpnI/BstBI and a 2057 bp EcoRI/KpnI, creating plasmid pMR12. Cells of E. coli strain DH1λ⁺ were used for primary transformation, and strain M5248 (λ cI857) was used for protein expression, although any comparable pair of E. coli strains carrying the cI⁺ and cI857 alleles could be utilized. Alternatively, any rec⁺ cI⁺ strain could be induced by chemical agents such as nalidixic acid to produce the polymerase.
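The primer design described above can be checked with a short script. The following is only a sketch: it assumes that Mutagenic Primer 1 is read in frame with its first base corresponding to the first base of codon 7 (nucleotide 19 of the coding sequence in FIG. 1), and the helper names `translate` and `codon_at` are illustrative rather than part of the disclosure. It confirms that the primer carries an alanine codon at position 18 and a BstBI recognition site (TTCGAA), and that Primer 2 carries an EcoRI recognition site (GAATTC).

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

PRIMER_1 = "CTGTTCGAACCCAAAGGCCGTGTCCTCCTGGTGGCCGGCCACCAC"  # SEQ ID No. 25
PRIMER_2 = "GAGGCTGCCGAATTCCAGCCTCTC"  # SEQ ID No. 26
BSTBI_SITE = "TTCGAA"
ECORI_SITE = "GAATTC"

# Assumption: the first base of primer 1 is the first base of codon 7.
FIRST_CODON_IN_PRIMER_1 = 7


def translate(dna):
    """Translate a DNA string in frame 1, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def codon_at(dna, codon_number, first_codon=FIRST_CODON_IN_PRIMER_1):
    """Return the three bases that encode the given codon number."""
    start = (codon_number - first_codon) * 3
    return dna[start:start + 3]


codon_18 = codon_at(PRIMER_1, 18)
print("Primer 1 translation:", translate(PRIMER_1))
print("Codon 18:", codon_18, "->", CODON_TABLE[codon_18])
print("BstBI site in primer 1:", BSTBI_SITE in PRIMER_1)
print("EcoRI site in primer 2:", ECORI_SITE in PRIMER_2)
```

Under the stated frame assumption, the translated primer matches residues 7 to 21 of the amino acid sequence in FIG. 1, with alanine at position 18.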

Purification of Polymerase

M5248 containing plasmid pMR12 was grown in one liter of LB medium (1% tryptone, 0.5% yeast extract, 1% NaCl), preferably 2× LB medium, containing 100 mg/ml ampicillin at 30° C. When the OD₆₀₀ reached 1.0, the culture was induced at 42° C. for 1.5 hours. The cultures were then cooled to <20° C. and the cells harvested by centrifugation in a Sorvall RC-3B centrifuge at 5000 rpm at 4° C. for 15 to 30 minutes. Harvested cells were stored at −80° C.

The cell pellet was resuspended in 25 ml pre-warmed lysis buffer (50 mM Tris-HCl pH 8.0, 10 mM MgCl₂, 16 mM (NH₄)₂SO₄, 1 mM EDTA, 0.1%, preferably 0.2% Tween 20, 0.1%, preferably 0.2% NP40). Preferably, the lysis buffer contains 300 mM NaCl. Resuspended cells were incubated at 75-85° C. for 10-20 minutes, sonicated for 1 minute, and cleared by centrifugation. The cleared lysate was passed through a 300 ml column of diethylaminoethyl cellulose (Whatman DE 52) equilibrated in buffer A (50 mM Tris-HCl pH 8.0, 1 mM EDTA, 0.1% Tween 20, 0.1% NP40) containing 100 mM, preferably 300 mM NaCl. Fractions were assayed for polymerase activity, and those demonstrating peak polymerase activity were pooled, diluted to 50 mM NaCl with buffer A, and loaded onto a heparin sepharose column (20 ml) equilibrated with 50 mM NaCl in buffer A. The polymerase was eluted from the column with a linear salt gradient from 50 mM to 700 mM NaCl in buffer A. Fractions were assayed for polymerase activity, and those demonstrating peak activity were pooled and dialyzed against final buffer (20 mM Tris-HCl pH 8.5, 50% (v/v) glycerol, 0.1 mM EDTA, 0.5% Tween 20, 0.5% NP40, 1 mM DTT, 100 mM KCl). The purified protein is designated FY7. The amino acid sequence (and DNA sequence encoding therefor) are presented in FIG. 1.

Bacterial Strains

E. coli strains: DH1λ⁺ [gyrA96, recA1, relA1, endA1, thi-1, hsdR17, supE44, λ⁺]; M5248 [λ(bio275, cI857, cIII+, N+, λ(H1))].

PCR

Plasmid DNA from E. coli DH1λ⁺ (pMR10) was prepared by the SDS alkaline lysis method (Sambrook et al., Molecular Cloning 2nd Ed. Cold Spring Harbor Press, 1989). Reaction conditions were as follows: 10 mM Tris-HCl pH 8.3, 50 mM KCl, 1.5 mM MgCl₂, 0.001% gelatin, 1 μM each primer, 2.5 U Taq polymerase, per 100 μl reaction. Cycling conditions were 94° C. for 2 minutes, then 35 cycles of 94° C. for 30 s, 55° C. for 30 s, and 72° C. for 3 minutes, followed by 72° C. for 7 minutes.
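For planning purposes, the cycling program above can be written down as data and its total programmed hold time computed. This is a minimal sketch: the `Step` class and function names are hypothetical, and ramp times between temperatures, which depend on the instrument, are not included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int   # block temperature in degrees Celsius
    seconds: int  # programmed hold time


INITIAL_DENATURATION = Step(94, 120)
CYCLE = (Step(94, 30), Step(55, 30), Step(72, 180))
N_CYCLES = 35
FINAL_EXTENSION = Step(72, 420)


def total_hold_seconds():
    """Sum of programmed hold times; ramping between steps is ignored."""
    per_cycle = sum(step.seconds for step in CYCLE)
    return (INITIAL_DENATURATION.seconds
            + N_CYCLES * per_cycle
            + FINAL_EXTENSION.seconds)


def pmol_in_reaction(conc_um, volume_ul):
    """1 uM equals 1 pmol/ul, so concentration times volume gives pmol."""
    return conc_um * volume_ul


print("Total hold time (min):", total_hold_seconds() / 60)
print("Each primer per 100 ul reaction (pmol):", pmol_in_reaction(1, 100))
```

The program therefore holds for 149 minutes in total, and each 100 μl reaction contains 100 pmol of each primer.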

Example 1 Formulation of the Enzyme in Mn Conditions

In the following “pre-mix” protocol, all the reagents are contained in two solutions; reagent mix A and reagent mix B.

Reagent Mix A

The following reagents were combined to make 10 ml of reagent mix A:

2.5 ml 1 M HEPPS N-(2-hydroxyethyl) piperazine-N′-(3-propanesulfonic acid), pH 8.0

500 μl 1 M tartaric acid, pH 8.0

50,000 units FY7 DNA polymerase

1 unit Thermoplasma acidophilum inorganic pyrophosphatase

100 μl 100 mM dATP

100 μl 100 mM dTTP

100 μl 100 mM dCTP

500 μl 100 mM dITP

9.375 μl 100 μM C-7-propargylamino-4-rhodamine-6-G-ddATP

90 μl 100 μM C-5-propargylamino-4-rhodamine-X-ddCTP

6.75 μl 100 μM C-7-propargylamino-4-rhodamine-110-ddGTP

165 μl 100 μM C-5-propargylamino-4-tetramethylrhodamine-ddUTP

10 μl 50 mM EDTA

1 ml glycerol

The volume was made up to 10,000 μl with deionized H₂O.

Reagent Mix B

The following reagents were combined to make 10 ml of reagent mix B:

10 μl 1M MES 2-(N-morpholino)ethanesulfonic acid, pH 6.0

200 μl 1M MgCl₂

75 μl 1M MnSO₄

The volume was made up to 10,000 μl with deionized H₂O.
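The concentrations of the individual components in reagent mix A and reagent mix B follow from the stock concentrations and volumes listed above. The sketch below applies the usual dilution relation (stock concentration × volume added ÷ final volume); the short dye-terminator labels are abbreviations introduced here for readability only.

```python
MIX_VOLUME_UL = 10_000.0  # both mixes are made up to 10 ml

# name: (stock concentration in mM, volume added in ul)
MIX_A_MM = {
    "HEPPS": (1000, 2500),
    "tartaric acid": (1000, 500),
    "dATP": (100, 100),
    "dTTP": (100, 100),
    "dCTP": (100, 100),
    "dITP": (100, 500),
    "EDTA": (50, 10),
}

# name: (stock concentration in uM, volume added in ul)
MIX_A_TERMINATORS_UM = {
    "R6G-ddATP": (100, 9.375),
    "ROX-ddCTP": (100, 90),
    "R110-ddGTP": (100, 6.75),
    "TMR-ddUTP": (100, 165),
}

# name: (stock concentration in mM, volume added in ul)
MIX_B_MM = {
    "MES": (1000, 10),
    "MgCl2": (1000, 200),
    "MnSO4": (1000, 75),
}


def diluted(stock_conc, added_ul, total_ul=MIX_VOLUME_UL):
    """Concentration after dilution, in the same unit as stock_conc."""
    return stock_conc * added_ul / total_ul


def mix_concentrations(recipe):
    """Map each component name to its concentration in the finished mix."""
    return {name: diluted(conc, vol) for name, (conc, vol) in recipe.items()}


for label, recipe, unit in (
    ("Mix A", MIX_A_MM, "mM"),
    ("Mix A terminators", MIX_A_TERMINATORS_UM, "uM"),
    ("Mix B", MIX_B_MM, "mM"),
):
    for name, conc in mix_concentrations(recipe).items():
        print(f"{label}: {name} = {conc:g} {unit}")
```

For example, reagent mix A works out to 250 mM HEPPS and 5 mM dITP, and reagent mix B to 20 mM MgCl₂ and 7.5 mM MnSO₄.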

Example 2 Use of the Formulation From Example 1

Two (2) μl reagent mix A, 2 μl reagent mix B, 200 ng M13mp18 DNA, 5 pmole of primer (M13-40 Forward 5′-GTTTTCCCAGTCACGACGTTGTA) (SEQ ID No. 27), and deionized water to a total volume of 20 μl were mixed together and subjected to 25 cycles of (95° C. 30 seconds, 60° C. 1 minute) in a thermal cycler. After cycling, 4 μl of a solution which contained 1.5 M sodium acetate, 250 mM EDTA was added. The solution was mixed and 4 volumes (100 μl) of ethanol added. The DNA was precipitated by incubation on ice for 15-20 minutes followed by centrifugation. The supernatant was removed and the pellet was washed with 70% ethanol, dried and resuspended in 4 μl of formamide containing loading dye. The resuspended DNA was then run on an automated fluorescent DNA sequencing apparatus (ABI model 377 instrument). The print out from the machine of the DNA sequence is shown as FIG. 2.
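The per-reaction quantities implied by this protocol can be worked out as follows. This is a sketch under two assumptions that are not stated in the text: that the reaction volume is unchanged by thermal cycling, and that "4 volumes" of ethanol refers to the combined volume of reaction plus stop solution. Function names are illustrative.

```python
REACTION_VOLUME_UL = 20.0
MIX_A_ALIQUOT_UL = 2.0
STOP_SOLUTION_UL = 4.0
ETHANOL_VOLUMES = 4.0

ENZYME_UNITS_IN_MIX_A = 50_000.0  # from Example 1
MIX_A_VOLUME_UL = 10_000.0        # from Example 1


def enzyme_units_per_reaction():
    """Units of FY7 polymerase delivered by the 2 ul aliquot of mix A."""
    return ENZYME_UNITS_IN_MIX_A / MIX_A_VOLUME_UL * MIX_A_ALIQUOT_UL


def primer_concentration_um(pmol=5.0):
    """pmol per ul is numerically equal to uM."""
    return pmol / REACTION_VOLUME_UL


def after_stop_solution(stock_conc):
    """Concentration of a stop-solution component once added to the reaction."""
    return stock_conc * STOP_SOLUTION_UL / (REACTION_VOLUME_UL + STOP_SOLUTION_UL)


def ethanol_volume_ul():
    """Four volumes relative to reaction plus stop solution."""
    return ETHANOL_VOLUMES * (REACTION_VOLUME_UL + STOP_SOLUTION_UL)


print("FY7 units per reaction:", enzyme_units_per_reaction())
print("Primer (uM):", primer_concentration_um())
print("Sodium acetate after stop (M):", after_stop_solution(1.5))
print("EDTA after stop (mM):", round(after_stop_solution(250.0), 2))
print("Ethanol for 4 volumes (ul):", ethanol_volume_ul())
```

Four volumes of the 24 μl mixture comes to 96 μl, consistent with the rounded figure of 100 μl given in the protocol.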

Example 3 Formulation of the Enzyme in Mg Conditions

In the following “pre-mix” protocol, all the reagents are contained in one solution.

Sequencing Premix

The following reagents were combined to make 800 μl of Sequencing premix:

200 μl of 500 mM Tris-HCl pH 9.5, 20 mM MgCl₂

100 μl 40 units/μl FY7 DNA polymerase, 0.0008 units/μl Thermoplasma acidophilum inorganic pyrophosphatase

100 μl 10 mM dITP, 2 mM dATP, 2 mM dTTP, 2 mM dCTP

100 μl 0.125 μM C-7-propargylamino-4-rhodamine-6-G-ddATP

100 μl 1.2 μM C-5-propargylamino-4-rhodamine-X-ddCTP

100 μl 0.09 μM C-7-propargylamino-4-rhodamine-110-ddGTP

100 μl 2.2 μM C-5-propargylamino-4-tetramethylrhodamine-ddUTP
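As a consistency check, the component volumes listed above can be summed and the concentration of selected components in the finished premix computed. The sketch below is illustrative only; the component labels are shorthand introduced here, and the aliquot size passed to `enzyme_units_in` is a free parameter.

```python
# Volume (ul) of each component solution listed for the sequencing premix.
PREMIX_VOLUMES_UL = {
    "Tris-HCl / MgCl2 buffer": 200.0,
    "FY7 polymerase / pyrophosphatase": 100.0,
    "dITP / dATP / dTTP / dCTP": 100.0,
    "ddATP terminator": 100.0,
    "ddCTP terminator": 100.0,
    "ddGTP terminator": 100.0,
    "ddUTP terminator": 100.0,
}

TOTAL_UL = sum(PREMIX_VOLUMES_UL.values())


def in_premix(stock_conc, added_ul, total_ul=TOTAL_UL):
    """Concentration in the finished premix, same unit as stock_conc."""
    return stock_conc * added_ul / total_ul


def enzyme_units_in(aliquot_ul, stock_units_per_ul=40.0, added_ul=100.0):
    """Units of FY7 polymerase contained in an aliquot of premix."""
    return in_premix(stock_units_per_ul, added_ul) * aliquot_ul


print("Total premix volume (ul):", TOTAL_UL)
print("Tris-HCl (mM):", in_premix(500.0, 200.0))
print("MgCl2 (mM):", in_premix(20.0, 200.0))
print("dITP (mM):", in_premix(10.0, 100.0))
print("dATP, dTTP, dCTP (mM each):", in_premix(2.0, 100.0))
print("FY7 units in a 4 ul aliquot:", enzyme_units_in(4.0))
```

The listed volumes add up to the stated 800 μl, giving 125 mM Tris-HCl, 5 mM MgCl₂, and 5 units/μl of FY7 DNA polymerase in the premix.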

Example 4 Use of the Formulation From Example 3

Four (4) μl of sequencing premix, 200 ng M13mp18 DNA, 5 pmole of primer (M13-40 Forward 5′-GTTTTCCCAGTCACGACGTTGTA) (SEQ ID No. 27), and deionized water to a total volume of 20 μl were mixed together and subjected to 25 cycles of (95° C. 30 seconds, 60° C. 2 minutes) in a thermal cycler. After cycling, 7 μl of 7.5 M ammonium acetate was added. The solution was mixed and 4 volumes (100 μl) of ethanol added. The DNA was precipitated by incubation on ice for 15-20 minutes followed by centrifugation. The supernatant was removed and the pellet was washed with 70% ethanol, dried and resuspended in 4 μl of formamide containing loading dye. The resuspended DNA was then run on an automated fluorescent DNA sequencing apparatus (ABI model 377 instrument). The print out from the machine of the DNA sequence is shown as FIG. 3.

Example 5 Polymerase Activity Versus Salt Concentration (KCl) for Thermo Sequenase™ Enzyme and FY7 Enzyme

The percent of maximum polymerase activity was measured for Thermo Sequenase™ enzyme DNA polymerase and FY7 DNA polymerase under varying KCl concentrations. The results are depicted in FIG. 4. The data indicate that FY7 has a much higher salt optimum as well as a broader range of tolerance for salt in the reaction mixture than Thermo Sequenase™. The salt concentration which gives 50% activity is five-fold higher for FY7 than for Thermo Sequenase™.

The effect of high salt concentrations on DNA sequencing ability in radioactively labeled DNA sequencing reactions was also examined. The results are presented in FIG. 5. At KCl concentrations of 50 mM or higher Thermo Sequenase™ polymerase performance degrades to levels at which usable data cannot be extracted. FY7 DNA polymerase, however, is able to give quite good sequencing data at concentrations of KCl of 100 mM.

Example 6 Fluorescent Sequencing Salt Tolerance

These experiments examined the effect of the above-demonstrated polymerase activity in high salt concentrations on DNA sequencing ability in fluorescently labeled terminator DNA sequencing reactions. The results are presented in FIGS. 6-15.

FIGS. 6-10 show the effect of increasing salt concentration on the performance of Thermo Sequenase. At concentrations as low as 25 mM, data quality is affected, with the read length decreasing from at least 600 bases to about 450 bases. At 50 mM salt the read length is further decreased to about 350 bases, at 75 mM to about 250 bases, and at 100 mM the read length is negligible.

FIGS. 11-15 show the effect of increasing salt concentration on the performance of FY7 DNA polymerase. There is no detrimental effect on performance to at least 75 mM KCl and only a slight decrease in data quality at 100 mM KCl.
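The Thermo Sequenase read lengths quoted above can be expressed as a fraction of the starting read length. The sketch below uses only the approximate figures given in the text, with two simplifications that are assumptions: the "at least 600 bases" baseline is taken as 600 and keyed as 0 mM added salt, and the "negligible" read length at 100 mM is taken as 0. No comparable figures are tabulated for FY7 because the text gives none.

```python
# Added salt (mM) -> approximate Thermo Sequenase read length (bases).
# 600 stands in for "at least 600"; 0 stands in for "negligible".
THERMO_SEQUENASE_READ_LENGTH = {0: 600, 25: 450, 50: 350, 75: 250, 100: 0}


def fraction_retained(read_lengths, baseline_mm=0):
    """Read length at each salt level relative to the baseline condition."""
    baseline = read_lengths[baseline_mm]
    return {salt: length / baseline for salt, length in read_lengths.items()}


for salt_mm, fraction in fraction_retained(THERMO_SEQUENASE_READ_LENGTH).items():
    print(f"{salt_mm:>3} mM: {fraction:.0%} of baseline read length")
```

On these figures, Thermo Sequenase retains about three quarters of its read length at 25 mM salt and well under half at 75 mM.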

As it is recognized that some types of DNA preparations may be contaminated with salt (which is detrimental to DNA sequencing data quality), the use of FY7 DNA polymerase allows for a more robust sequencing reaction over a broader range of template conditions.

Example 7 Polymerase Processivity

The processivity (number of nucleotides incorporated per DNA polymerase binding event) was measured for different DNA sequencing polymerases. The results are presented in FIG. 16. Thermo Sequenase DNA polymerase has a processivity of only ˜4 nucleotides per binding event. AmpliTaq FS DNA polymerase has a processivity of ˜15 nucleotides per binding event. FY7 DNA polymerase, at ˜30 nucleotides per binding event, has a processivity more than seven-fold greater than Thermo Sequenase DNA polymerase and ˜two-fold greater than AmpliTaq FS DNA polymerase.
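The fold differences stated above follow directly from the approximate processivity values. The sketch below also estimates how many binding events would be needed to extend a product of a given length, under the simplifying assumption (not made in the text) that every binding event incorporates exactly the quoted average number of nucleotides.

```python
import math

# Approximate nucleotides incorporated per binding event (FIG. 16).
PROCESSIVITY_NT = {"Thermo Sequenase": 4, "AmpliTaq FS": 15, "FY7": 30}


def fold_over(reference, enzyme="FY7"):
    """Processivity of `enzyme` relative to `reference`."""
    return PROCESSIVITY_NT[enzyme] / PROCESSIVITY_NT[reference]


def binding_events_for(length_nt, enzyme):
    """Idealized count of binding events needed to extend length_nt bases."""
    return math.ceil(length_nt / PROCESSIVITY_NT[enzyme])


print("FY7 vs Thermo Sequenase:", fold_over("Thermo Sequenase"))
print("FY7 vs AmpliTaq FS:", fold_over("AmpliTaq FS"))
for name in PROCESSIVITY_NT:
    print(name, "binding events for 600 nt:", binding_events_for(600, name))
```

The ratios come out at 7.5-fold and 2-fold respectively, in agreement with the statement above.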

Example 8 Polymerase Extension with dITP at 72° C.

This series of experiments examined the improved read length obtained when using FY7 polymerase versus Thermo Sequenase DNA polymerase in radioactively labeled sequencing reactions incorporating the dGTP (Guanosine triphosphate) analog dITP (Inosine triphosphate) at 72° C. The results are presented in FIG. 17. FY7 is able to incorporate >50-100 more nucleotides under standard [α-³³P]dATP sequencing conditions than Thermo Sequenase.

Example 9 Effect of Extension Step Time on Length of Read

This series of experiments examined the effect of increasing extension step time on the read length and data quality of Thermo Sequenase and FY7 DNA polymerases in fluorescently labeled terminator DNA sequencing reactions. The results are presented in FIGS. 18-27.

FIGS. 18-22 show the effect of increasing extension step time on the read length and data quality produced by Thermo Sequenase DNA polymerase. These data show that Thermo Sequenase requires an extension step of at least two minutes in order to achieve a quality read of at least 600 bases. Signal strength generally increases to a maximum at a four minute extension (the time specified in the commercial product utilizing this enzyme and method).

FIGS. 23-27 show the effect of increasing extension step time on the read length and data quality produced by FY7 DNA polymerase. These data show that FY7 requires an extension step of at least 30 seconds in order to achieve a quality read of at least 600 bases. Signal strengths plateau at about one minute extension time. The FY7 DNA polymerase can produce data of equivalent quality to Thermo Sequenase in one-quarter to one-half the extension time.
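The "one-quarter to one-half" comparison can be reproduced from the extension times quoted in this example. The sketch below simply forms the ratios; the dictionary and function names are illustrative, and "about one minute" is taken as 60 seconds.

```python
# Extension step times in seconds, as quoted in Example 9.
MIN_FOR_600_BASE_READ_S = {"Thermo Sequenase": 120, "FY7": 30}
SIGNAL_PLATEAU_S = {"Thermo Sequenase": 240, "FY7": 60}


def time_ratio(fy7_seconds, thermo_sequenase_seconds):
    """FY7 extension time as a fraction of the Thermo Sequenase time."""
    return fy7_seconds / thermo_sequenase_seconds


minimum_vs_minimum = time_ratio(
    MIN_FOR_600_BASE_READ_S["FY7"], MIN_FOR_600_BASE_READ_S["Thermo Sequenase"])
plateau_vs_plateau = time_ratio(
    SIGNAL_PLATEAU_S["FY7"], SIGNAL_PLATEAU_S["Thermo Sequenase"])
plateau_vs_minimum = time_ratio(
    SIGNAL_PLATEAU_S["FY7"], MIN_FOR_600_BASE_READ_S["Thermo Sequenase"])

print("Minimum vs minimum:", minimum_vs_minimum)
print("Plateau vs plateau:", plateau_vs_plateau)
print("FY7 plateau vs Thermo Sequenase minimum:", plateau_vs_minimum)
```

Comparing like with like gives one-quarter in both cases; the one-half figure arises when the FY7 plateau time is compared with the Thermo Sequenase minimum.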

Although the above examples describe various embodiments of the invention in detail, many variations will be apparent to those of ordinary skill in the art. Accordingly, the above examples are intended for illustration purposes and should not be used in any way to restrict the scope of the appended claims.

27 1 2505 DNA Thermus thermophilus CDS (1)..(2502) 1 atg gaa gcg atg ctg ccg ctg ttc gaa ccc aaa ggc cgt gtc ctc ctg 48 Met Glu Ala Met Leu Pro Leu Phe Glu Pro Lys Gly Arg Val Leu Leu 1 5 10 15 gtg gcc ggc cac cac ctg gcc tac cgc acc ttc ttc gcc ctg aag ggc 96 Val Ala Gly His His Leu Ala Tyr Arg Thr Phe Phe Ala Leu Lys Gly 20 25 30 ctc acc acg agc cgg ggc gaa ccg gtg cag gcg gtc tac ggc ttc gcc 144 Leu Thr Thr Ser Arg Gly Glu Pro Val Gln Ala Val Tyr Gly Phe Ala 35 40 45 aag agc ctc ctc aag gcc ctg aag gag gac ggg tac aag gcc gtc ttc 192 Lys Ser Leu Leu Lys Ala Leu Lys Glu Asp Gly Tyr Lys Ala Val Phe 50 55 60 gtg gtc ttt gac gcc aag gcc ccc tcc ttc cgc cac gag gcc tac gag 240 Val Val Phe Asp Ala Lys Ala Pro Ser Phe Arg His Glu Ala Tyr Glu 65 70 75 80 gcc tac aag gcg ggg agg gcc ccg acc ccc gag gac ttc ccc cgg cag 288 Ala Tyr Lys Ala Gly Arg Ala Pro Thr Pro Glu Asp Phe Pro Arg Gln 85 90 95 ctc gcc ctc atc aag gag ctg gtg gac ctc ctg ggg ttt acc cgc ctc 336 Leu Ala Leu Ile Lys Glu Leu Val Asp Leu Leu Gly Phe Thr Arg Leu 100 105 110 gag gtc ccc ggc tac gag gcg gac gac gtt ctc gcc acc ctg gcc aag 384 Glu Val Pro Gly Tyr Glu Ala Asp Asp Val Leu Ala Thr Leu Ala Lys 115 120 125 aag gcg gaa aag gag ggg tac gag gtg cgc atc ctc acc gcc gac cgc 432 Lys Ala Glu Lys Glu Gly Tyr Glu Val Arg Ile Leu Thr Ala Asp Arg 130 135 140 gac ctc tac caa ctc gtc tcc gac cgc gtc gcc gtc ctc cac ccc gag 480 Asp Leu Tyr Gln Leu Val Ser Asp Arg Val Ala Val Leu His Pro Glu 145 150 155 160 acc gcc gac cgc gac ctc tac caa ctc gtc tcc gac cgc gtc gcc gtc 528 Thr Ala Asp Arg Asp Leu Tyr Gln Leu Val Ser Asp Arg Val Ala Val 165 170 175 ctc cac ccc gag ggc cac ctc atc acc ccg gag tgg ctt tgg gag aag 576 Leu His Pro Glu Gly His Leu Ile Thr Pro Glu Trp Leu Trp Glu Lys 180 185 190 tac ggc ctc agg ccg gag cag tgg gtg gac ttc cgc gcc ctc gtg ggg 624 Tyr Gly Leu Arg Pro Glu Gln Trp Val Asp Phe Arg Ala Leu Val Gly 195 200 205 gac ccc tcc gac aac ctc ccc ggg gtc aag ggc atc ggg gag aag acc 672 Asp Pro Ser Asp 
Asn Leu Pro Gly Val Lys Gly Ile Gly Glu Lys Thr 210 215 220 gcc ctc aag ctc ctc aag gag tgg gga agc ctg gaa aac ctc ctc aag 720 Ala Leu Lys Leu Leu Lys Glu Trp Gly Ser Leu Glu Asn Leu Leu Lys 225 230 235 240 ctc agg ctc tcc ttg gag ctc tcc cgg gtg cgc acc gac ctc ccc ctg 768 Leu Arg Leu Ser Leu Glu Leu Ser Arg Val Arg Thr Asp Leu Pro Leu 245 250 255 gag gtg gac ctc gcc cag ggg cgg gag ccc gac cgg gag ggg ctt agg 816 Glu Val Asp Leu Ala Gln Gly Arg Glu Pro Asp Arg Glu Gly Leu Arg 260 265 270 gcc ttc ctg gag agg ctg gaa ttc ggc agc ctc ctc cac gag ttc ggc 864 Ala Phe Leu Glu Arg Leu Glu Phe Gly Ser Leu Leu His Glu Phe Gly 275 280 285 ctc ctg gag gcc ccc gcc ccc ctg gag gag gcc ccc tgg ccc ccg ccg 912 Leu Leu Glu Ala Pro Ala Pro Leu Glu Glu Ala Pro Trp Pro Pro Pro 290 295 300 gaa ggg gcc ttc gtg ggc ttc gtc ctc tcc cgc ccc gag ccc atg tgg 960 Glu Gly Ala Phe Val Gly Phe Val Leu Ser Arg Pro Glu Pro Met Trp 305 310 315 320 gcg gag ctt aaa gcc ctg gcc gcc tgc agg gac ggc cgg gtg cac cgg 1008 Ala Glu Leu Lys Ala Leu Ala Ala Cys Arg Asp Gly Arg Val His Arg 325 330 335 gca gca gac ccc ttg gcg ggg cta aag gac ctc aag gag gtc cgg ggc 1056 Ala Ala Asp Pro Leu Ala Gly Leu Lys Asp Leu Lys Glu Val Arg Gly 340 345 350 ctc ctc gcc aag gac ctc gcc gtc ttg gcc tcg agg gag ggg cta gac 1104 Leu Leu Ala Lys Asp Leu Ala Val Leu Ala Ser Arg Glu Gly Leu Asp 355 360 365 ctc gtg ccc ggg gac gac ccc atg ctc ctc gcc tac ctc ctg gac ccc 1152 Leu Val Pro Gly Asp Asp Pro Met Leu Leu Ala Tyr Leu Leu Asp Pro 370 375 380 tcc aac acc acc ccc gag ggg gtg gcg cgg cgc tac ggg ggg gag tgg 1200 Ser Asn Thr Thr Pro Glu Gly Val Ala Arg Arg Tyr Gly Gly Glu Trp 385 390 395 400 acg gag gac gcc gcc cac cgg gcc ctc ctc tcg gag agg ctc cat cgg 1248 Thr Glu Asp Ala Ala His Arg Ala Leu Leu Ser Glu Arg Leu His Arg 405 410 415 aac ctc ctt aag cgc ctc gag ggg gag gag aag ctc ctt tgg ctc tac 1296 Asn Leu Leu Lys Arg Leu Glu Gly Glu Glu Lys Leu Leu Trp Leu Tyr 420 425 430 cac gag gtg gaa aag ccc ctc tcc cgg gtc ctg gcc 
cac atg gag gcc 1344 His Glu Val Glu Lys Pro Leu Ser Arg Val Leu Ala His Met Glu Ala 435 440 445 acc ggg gta cgg ctg gac gtg gcc tac ctt cag gcc ctt tcc ctg gag 1392 Thr Gly Val Arg Leu Asp Val Ala Tyr Leu Gln Ala Leu Ser Leu Glu 450 455 460 ctt gcg gag gag atc cgc cgc ctc gag gag gag gtc ttc cgc ttg gcg 1440 Leu Ala Glu Glu Ile Arg Arg Leu Glu Glu Glu Val Phe Arg Leu Ala 465 470 475 480 ggc cac ccc ttc aac ctc aac tcc cgg gac cag ctg gaa agg gtg ctc 1488 Gly His Pro Phe Asn Leu Asn Ser Arg Asp Gln Leu Glu Arg Val Leu 485 490 495 ttt gac gag ctt agg ctt ccc gcc ttg ggg aag acg caa aag aca ggc 1536 Phe Asp Glu Leu Arg Leu Pro Ala Leu Gly Lys Thr Gln Lys Thr Gly 500 505 510 aag cgc tcc acc agc gcc gcg gtg ctg gag gcc cta cgg gag gcc cac 1584 Lys Arg Ser Thr Ser Ala Ala Val Leu Glu Ala Leu Arg Glu Ala His 515 520 525 ccc atc gtg gag aag atc ctc cag cac cgg gag ctc acc aag ctc aag 1632 Pro Ile Val Glu Lys Ile Leu Gln His Arg Glu Leu Thr Lys Leu Lys 530 535 540 aac acc tac gtg gac ccc ctc cca agc ctc gtc cac ccg agg acg ggc 1680 Asn Thr Tyr Val Asp Pro Leu Pro Ser Leu Val His Pro Arg Thr Gly 545 550 555 560 cgc ctc cac acc cgc ttc aac cag acg gcc acg gcc acg ggg agg ctt 1728 Arg Leu His Thr Arg Phe Asn Gln Thr Ala Thr Ala Thr Gly Arg Leu 565 570 575 agt agc tcc gac ccc aac ctg cag aac atc ccc gtc cgc acc ccc ttg 1776 Ser Ser Ser Asp Pro Asn Leu Gln Asn Ile Pro Val Arg Thr Pro Leu 580 585 590 ggc cag agg atc cgc cgg gcc ttc gtg gcc gag gcg ggt tgg gcg ttg 1824 Gly Gln Arg Ile Arg Arg Ala Phe Val Ala Glu Ala Gly Trp Ala Leu 595 600 605 gtg gcc ctg gac tat agc cag ata gag ctc cgc gtc ctc gcc cac ctc 1872 Val Ala Leu Asp Tyr Ser Gln Ile Glu Leu Arg Val Leu Ala His Leu 610 615 620 tcc ggg gac gaa aac ctg atc agg gtc ttc cag gag ggg aag gac atc 1920 Ser Gly Asp Glu Asn Leu Ile Arg Val Phe Gln Glu Gly Lys Asp Ile 625 630 635 640 cac acc cag acc gca agc tgg atg ttc ggc gtc ccc ccg gag gcc gtg 1968 His Thr Gln Thr Ala Ser Trp Met Phe Gly Val Pro Pro Glu Ala Val 645 650 655 gac 
ccc ctg atg cgc cgg gcg gcc aag acg gtg aac tac ggc gtc ctc 2016 Asp Pro Leu Met Arg Arg Ala Ala Lys Thr Val Asn Tyr Gly Val Leu 660 665 670 tac ggc atg tcc gcc cat agg ctc tcc cag gag cta gcc atc ccc tac 2064 Tyr Gly Met Ser Ala His Arg Leu Ser Gln Glu Leu Ala Ile Pro Tyr 675 680 685 gaa gaa gcg gtg gcc ttt ata gag cgc tac ttc caa agc ttc ccc aag 2112 Glu Glu Ala Val Ala Phe Ile Glu Arg Tyr Phe Gln Ser Phe Pro Lys 690 695 700 gtg cgg gcc tgg ata gaa aag acc ctg gag gag ggg agg aag cgg ggc 2160 Val Arg Ala Trp Ile Glu Lys Thr Leu Glu Glu Gly Arg Lys Arg Gly 705 710 715 720 tac gtg gaa acc ctc ttc gga aga agg cgc tac gtg ccc gac ctc aac 2208 Tyr Val Glu Thr Leu Phe Gly Arg Arg Arg Tyr Val Pro Asp Leu Asn 725 730 735 gcc cgg gtg aag agc gtc agg gag gcc gcg gag cgc atg gcc ttc aac 2256 Ala Arg Val Lys Ser Val Arg Glu Ala Ala Glu Arg Met Ala Phe Asn 740 745 750 atg ccc gtc cag ggc acc gcc gcc gac ctc atg aag ctc gcc atg gtg 2304 Met Pro Val Gln Gly Thr Ala Ala Asp Leu Met Lys Leu Ala Met Val 755 760 765 aag ctc ttc ccc cgc ctc cgg gag atg ggg gcc cgc atg ctc ctc cag 2352 Lys Leu Phe Pro Arg Leu Arg Glu Met Gly Ala Arg Met Leu Leu Gln 770 775 780 gtc cac gac gag ctc ctc ctg gag gcc ccc caa gcg cgg gcc gag gag 2400 Val His Asp Glu Leu Leu Leu Glu Ala Pro Gln Ala Arg Ala Glu Glu 785 790 795 800 gtg gcg gct ttg gcc aac gag gcc atg gag aag gcc tat ccc ctc gcc 2448 Val Ala Ala Leu Ala Asn Glu Ala Met Glu Lys Ala Tyr Pro Leu Ala 805 810 815 gtg ccc ctg gag gtg gag gtg ggg atg ggg gag gac tgg ctt tcc gcc 2496 Val Pro Leu Glu Val Glu Val Gly Met Gly Glu Asp Trp Leu Ser Ala 820 825 830 aag ggt tag 2505 Lys Gly 2 834 PRT Thermus thermophilus 2 Met Glu Ala Met Leu Pro Leu Phe Glu Pro Lys Gly Arg Val Leu Leu 1 5 10 15 Val Ala Gly His His Leu Ala Tyr Arg Thr Phe Phe Ala Leu Lys Gly 20 25 30 Leu Thr Thr Ser Arg Gly Glu Pro Val Gln Ala Val Tyr Gly Phe Ala 35 40 45 Lys Ser Leu Leu Lys Ala Leu Lys Glu Asp Gly Tyr Lys Ala Val Phe 50 55 60 Val Val Phe Asp Ala Lys Ala Pro Ser Phe Arg His 
Glu Ala Tyr Glu 65 70 75 80 Ala Tyr Lys Ala Gly Arg Ala Pro Thr Pro Glu Asp Phe Pro Arg Gln 85 90 95 Leu Ala Leu Ile Lys Glu Leu Val Asp Leu Leu Gly Phe Thr Arg Leu 100 105 110 Glu Val Pro Gly Tyr Glu Ala Asp Asp Val Leu Ala Thr Leu Ala Lys 115 120 125 Lys Ala Glu Lys Glu Gly Tyr Glu Val Arg Ile Leu Thr Ala Asp Arg 130 135 140 Asp Leu Tyr Gln Leu Val Ser Asp Arg Val Ala Val Leu His Pro Glu 145 150 155 160 Thr Ala Asp Arg Asp Leu Tyr Gln Leu Val Ser Asp Arg Val Ala Val 165 170 175 Leu His Pro Glu Gly His Leu Ile Thr Pro Glu Trp Leu Trp Glu Lys 180 185 190 Tyr Gly Leu Arg Pro Glu Gln Trp Val Asp Phe Arg Ala Leu Val Gly 195 200 205 Asp Pro Ser Asp Asn Leu Pro Gly Val Lys Gly Ile Gly Glu Lys Thr 210 215 220 Ala Leu Lys Leu Leu Lys Glu Trp Gly Ser Leu Glu Asn Leu Leu Lys 225 230 235 240 Leu Arg Leu Ser Leu Glu Leu Ser Arg Val Arg Thr Asp Leu Pro Leu 245 250 255 Glu Val Asp Leu Ala Gln Gly Arg Glu Pro Asp Arg Glu Gly Leu Arg 260 265 270 Ala Phe Leu Glu Arg Leu Glu Phe Gly Ser Leu Leu His Glu Phe Gly 275 280 285 Leu Leu Glu Ala Pro Ala Pro Leu Glu Glu Ala Pro Trp Pro Pro Pro 290 295 300 Glu Gly Ala Phe Val Gly Phe Val Leu Ser Arg Pro Glu Pro Met Trp 305 310 315 320 Ala Glu Leu Lys Ala Leu Ala Ala Cys Arg Asp Gly Arg Val His Arg 325 330 335 Ala Ala Asp Pro Leu Ala Gly Leu Lys Asp Leu Lys Glu Val Arg Gly 340 345 350 Leu Leu Ala Lys Asp Leu Ala Val Leu Ala Ser Arg Glu Gly Leu Asp 355 360 365 Leu Val Pro Gly Asp Asp Pro Met Leu Leu Ala Tyr Leu Leu Asp Pro 370 375 380 Ser Asn Thr Thr Pro Glu Gly Val Ala Arg Arg Tyr Gly Gly Glu Trp 385 390 395 400 Thr Glu Asp Ala Ala His Arg Ala Leu Leu Ser Glu Arg Leu His Arg 405 410 415 Asn Leu Leu Lys Arg Leu Glu Gly Glu Glu Lys Leu Leu Trp Leu Tyr 420 425 430 His Glu Val Glu Lys Pro Leu Ser Arg Val Leu Ala His Met Glu Ala 435 440 445 Thr Gly Val Arg Leu Asp Val Ala Tyr Leu Gln Ala Leu Ser Leu Glu 450 455 460 Leu Ala Glu Glu Ile Arg Arg Leu Glu Glu Glu Val Phe Arg Leu Ala 465 470 475 480 Gly His Pro Phe Asn Leu Asn Ser Arg Asp Gln Leu Glu 
Arg Val Leu 485 490 495 Phe Asp Glu Leu Arg Leu Pro Ala Leu Gly Lys Thr Gln Lys Thr Gly 500 505 510 Lys Arg Ser Thr Ser Ala Ala Val Leu Glu Ala Leu Arg Glu Ala His 515 520 525 Pro Ile Val Glu Lys Ile Leu Gln His Arg Glu Leu Thr Lys Leu Lys 530 535 540 Asn Thr Tyr Val Asp Pro Leu Pro Ser Leu Val His Pro Arg Thr Gly 545 550 555 560 Arg Leu His Thr Arg Phe Asn Gln Thr Ala Thr Ala Thr Gly Arg Leu 565 570 575 Ser Ser Ser Asp Pro Asn Leu Gln Asn Ile Pro Val Arg Thr Pro Leu 580 585 590 Gly Gln Arg Ile Arg Arg Ala Phe Val Ala Glu Ala Gly Trp Ala Leu 595 600 605 Val Ala Leu Asp Tyr Ser Gln Ile Glu Leu Arg Val Leu Ala His Leu 610 615 620 Ser Gly Asp Glu Asn Leu Ile Arg Val Phe Gln Glu Gly Lys Asp Ile 625 630 635 640 His Thr Gln Thr Ala Ser Trp Met Phe Gly Val Pro Pro Glu Ala Val 645 650 655 Asp Pro Leu Met Arg Arg Ala Ala Lys Thr Val Asn Tyr Gly Val Leu 660 665 670 Tyr Gly Met Ser Ala His Arg Leu Ser Gln Glu Leu Ala Ile Pro Tyr 675 680 685 Glu Glu Ala Val Ala Phe Ile Glu Arg Tyr Phe Gln Ser Phe Pro Lys 690 695 700 Val Arg Ala Trp Ile Glu Lys Thr Leu Glu Glu Gly Arg Lys Arg Gly 705 710 715 720 Tyr Val Glu Thr Leu Phe Gly Arg Arg Arg Tyr Val Pro Asp Leu Asn 725 730 735 Ala Arg Val Lys Ser Val Arg Glu Ala Ala Glu Arg Met Ala Phe Asn 740 745 750 Met Pro Val Gln Gly Thr Ala Ala Asp Leu Met Lys Leu Ala Met Val 755 760 765 Lys Leu Phe Pro Arg Leu Arg Glu Met Gly Ala Arg Met Leu Leu Gln 770 775 780 Val His Asp Glu Leu Leu Leu Glu Ala Pro Gln Ala Arg Ala Glu Glu 785 790 795 800 Val Ala Ala Leu Ala Asn Glu Ala Met Glu Lys Ala Tyr Pro Leu Ala 805 810 815 Val Pro Leu Glu Val Glu Val Gly Met Gly Glu Asp Trp Leu Ser Ala 820 825 830 Lys Gly 3 591 DNA Thermus sp. 
modified_base (4) a, t, c or g 3 attngacggc cagtggggat cttgcatgcn tgcagntnng ggnnnngggc ccnnnnntnc 60 ccnggtacct gagccgaatt cgtaatcatg gtcatagctg tttcctgtgt gaaattgtta 120 tccgctcaca attccacaca acatacgagc cggaagcata aagtgtaaag cctggggtgc 180 ctaatgagtg agctaactca cattaattgc gttgcgctca ctgcccgctt tccagtcggg 240 aaacctgtcg tgccagctgc attaatgaat cggccaacgc gcggggagag gcggtttgcg 300 tattgggcgc cagggtggtt tttcttttca ccagtgagac gggcaacagc tgattgccct 360 tcaccgcctg gccctgagag agttgcagca agcggtccac gctggtttgc cccagcaggc 420 gaaaatcctg tttgatggtg gttccgaaat cggcaaaatc ccttataaat caaaagaata 480 gcccgagatg ggttgagtgt tgttccagtt tggaacaaga gtccactatt aaagaacgtg 540 gactccaacg tcaaagggcg aaaaaccgtc tatcagggcg ataggcccac t 591 4 605 DNA Thermus sp. modified_base (1) a, t, c or g 4 ngacggccag tgccaagctt gcatgcctgc aggtcgactc tagaggatcc ccgggtaccg 60 agctcgaatt cgtaatcatg gtcatagctg tttcctgtgt gaaattgtta tccgctcaca 120 attccacaca acatacgagc cggaagcata aagtgtaaag cctggggtgc ctaatgagtg 180 agctaactca cattaattgc gttgcgctca ctgcccgctt tccagtcggg aaacctgtcg 240 tgccagctgc attaatgaat cggccaacgc gcggggagag gcggtttgcg tattgggcgc 300 cagggtggtt tttcttttca ccagtgagac gggcaacagc tgattgccct tcaccgcctg 360 gccctgagag agttgcagca agcggtccac gctggtttgc cccagcaggc gaaaatcctg 420 tttgatggtg gttccgaaat cggcaaaatc ccttataaat caaaagaata gcccgagata 480 gggttgagtg ttgttccagt ttggaacaag antccactat taaagaacgt ggactccaac 540 gtcaaagggc gaaaaaccgt ctatcagggc gatggcccac tacgtgaacc atcacccaaa 600 tcaas 605 5 595 DNA Thermus sp. 
modified_base (509) a, t, c or g 5 cgacggcagt gccaaccttg catgcctgca ggtcgactct agaggacccc gggtaccgag 60 ctcgaattcg taatcatggt catagctgtt tcctgtgtga aattgttatc cgctcacaat 120 tccacacaac atacgagccg gaagcataaa gtgtaaagcc tggggtgcct aatgagtgag 180 ctaactcaca ttaattgcgt tcgctcactg cccgctttcc agtcgggaaa cctgtcgtgc 240 cagctgcatt aatgaatcgg ccaacgcgcg gggagaggcg gtttgcgtat tgggcgccag 300 ggtgcttttt cttttcacca gtgagacggg caacagctga ttgcccttca ccgcctggcc 360 ctgagagagt tgcagcaagc ggtccacgct ggtttgcccc agcaggcgaa aatcctgttt 420 gatggtggtt ccgaaatcgg caaaatccct tataaatcaa aagaatagcc cgagataggg 480 ttgagtgttg ttccagtttg gaacaagant ccactattaa agaacgtgga ctccaacgtc 540 aaagggcgaa aaaccgtcta tcagggcgat ggcccactac gtgaaccatc accca 595 6 599 DNA Thermus sp. modified_base (6) a, t, c or g 6 cggcantgcc aaccttgcat gcctgcaggt cgactctaga aggaccccgg gtaccgagct 60 cgaattcgta atcatggtca tagctgtttc ctgtgtgaaa ttgttatccg ctcacaatcc 120 acacaacata cgagccggaa gcataaagtg taaagcctgg ggtgcctaat gagtgagcta 180 actcacatta attgcgttgc gctcactgcc cgctttccag tcgggaaacc tgtcgtgcca 240 gctgcattaa tgaatcggcc aacgcgcggg gagaggcggt ttgcgtattg ggcgccaggg 300 tggtttttct tttcaccagt gagacgggca acagctgatt gcccttcacc gcctggccct 360 gagagagttg cagcaagcgg tccacgctgg tttgccccag caggcgaaaa tcctgtttga 420 tggtggttcc gaaatcngca aaatccctta taaatcaaaa gaatagcccg agatagggtt 480 gagtgttgtt ccantttgga acaagatcca ctattaaaga acgtggactc cnacntccaa 540 aggcgaaaaa ccntctatca ngggcaaagg ccnctncntt aacnncnccn natcnnntt 599 7 585 DNA Thermus sp. 
modified_base (2) a, t, c or g 7 cngcagtgcg nccttgcatg cctgcaggtc gactctagag gaccccgggt accgagctcg 60 aattcgtaat catggtcata gctgtttcct gtgtgaaatt gttatccgct cacaattcca 120 cacaacatac gagccggaag cataaagtgt aaagcctggg gtgcctaatg agtgagctaa 180 ctcacattaa ttgcgttgcg ctcactgccc gctttccagt cgggaaacct gtcgtgccag 240 ctgcattaat gaatcggcca acgcgcgggg agaggcggtt tgcgtattgg gcgccagggt 300 ggtttttctt ttcaccagtg agacgggcaa cagctgattg cccttcaccg cctggccctg 360 agagagttgc ancaagcggt ccacnctggt ttgccccanc angcgaaaat cctgtntgat 420 ngtggtccna aatcngcnaa atcccntntn nntcnnaana atnncccnan atnnggttga 480 gttnntncnn cnggannnna ntncncnnnn nnnnannntn nacncnnncn tnnnnnggnn 540 annnnnnnnt nnnnngnnnn nnnnnnnnnn nnnnnnnnnn nnnnn 585 8 604 DNA Thermus sp. modified_base (1) a, t, c or g 8 ngacgggcag tgccaagctt gcatgcctgc aggtcgactc tagaggatcc ccggggtacc 60 gagctcgaat tcgtaatcat ggtcatagct gtttcctgtg tgaaattgtt atccgctcac 120 aattccacac aacatacgag ccggaagcat aaagtgtaaa gcctggggtg cctaatgagt 180 gagctaactc acattaattg cgttgcgctc actgcccgct ttccagtcgg gaaacctgtc 240 gtgccagctg cattaatgaa tcggcnaacg cgcggggaga ggcggtttgt gtntttggnt 300 ncanggtggc cntttttttt tttttttttt tntcnnnntc ncnnnnctnn antttttntt 360 ctnttntttn tnnnntttnt ttttnttttt ntnnntatcn ctnccnnntt tttttttttt 420 tntttccncc tncntnnntn tnattttntt ttttntantt tttcctttnt ttttttttnt 480 tttntanttt ntnncccctc ccccctttcc cccccccccc cccccccncc ccccnnntnt 540 tttttttctt nnttttccat cccctccncc ccccccttcn tnnnctntnt tttntttttt 600 tnnt 604 9 634 DNA Thermus sp. 
modified_base (3) a, t, c or g 9 atngaacggg cagtgccaag cttgcatgcc tgcaggtcga actctagagg atccccgggg 60 taccgagctc gaattcgtaa tcatggtcat agctgtttcc tgtgtgaaat tgtnatccgc 120 tcacaattcc acacaacatn nagccggaag cataangtgt taagcctggg gtgcctattg 180 antnancaat ctcncatttt tttatctctc tctcacnttt cttttntttc cngcacatna 240 cccctcctcn atttntattc ntttccttaa ncanncnncc tccatcctta ntccctcctt 300 nttttccttc nttcccctcc nncnccctnt tttttttttt ttcanccccn ntcnccttcc 360 ttnctccttc ttntcttttc tntncccttc ctattntttc tnctnncttt ctcntanccc 420 ctcccctaat ntcttttnct tcttttctct cncccctttt nccncctntc tctcttttct 480 tcttcccctc ncattatttt ttcttcnctn ccattctctt ctctcnttcc ncntattatn 540 ctcnttcctc tatcctttcc cccnctcatt nccncccatc ctnatttatc ttcncttttt 600 cccntttnnc ttatncnttt ccctctctnc atcc 634 10 597 DNA Thermus sp. modified_base (12) a, t, c or g 10 gacggcatgc cntgcttgca tgtcnactcn tcaggatccc cgggtaccga gctcgaattc 60 gtaatcatgg tcatagctgt ttcctgtgtg aaattgttat ccgctcacaa ttccacacaa 120 catacgagcc ggaagcataa agtgtaaagc ctggggtgcc taatgagtga gctaactcac 180 attaattgcg ttgcgctcac tgcccgcttt ccagtcggga aacctgtcgt gccagctgca 240 ttaatgaatc ggccaacgcg cggggagagg cggtttgcgt attgggcgcc agggtggttt 300 ttcttttcac cagtgagacg ggcaacagct gattgccctt caccgcctgg ccctgagaga 360 gttgcagcaa gcggtccacg ctggtttgcc ccagcaggcg aaaatcctgt ttgatggtgg 420 ttccgaaatc ggcaaaatcc cttataaatc aaaagaatac cgagatangg ttgantgttg 480 ttccagtttg gaacaagant ccactattaa agaacgtgga ctccaacgtc aaagggcgaa 540 aaaccgtcta tcagggcgan ggcccactac gtgaaccatc accaaatcaa tttttts 597 11 598 DNA Thermus sp. 
modified_base (1) a, t, c or g 11 ngacggccag tgccnagctt gcatgcctgc aggtcgactc tagaggatcc ccgggtaccg 60 agctcgaatt cgtaatcatg gtcatagctg tttcctgtgt gaaattgtta tccgctcaca 120 attccacaca acatacgagc cggaagcata aagtgtaaag cctggggtgc ctaatgagtg 180 agctaactca cattaattgc gttgcgctca ctgcccgctt tccagtcggg aaacctgtcg 240 tgccagctgc attaatgaat cggccaacgc gcggggagag gcggtttgcg tattgggcgc 300 cagggtggtt tttcttttca ccagtgagac gggcaacagc tgattgccct tcaccgcctg 360 gccctgagag agttgcagca agcggtccac gctggtttgc cccagcaggc gaaaatcctg 420 tttgatggtg gttccgaaat cggcaaaatc ccttataaat caaaagaata gcccgagata 480 gggttgagtg ttgttccagt ttggaacaag antccactat taaagaacgt ggactccaac 540 gtcaaagggc gaaaaacgtc tatcagggcg atggcccact acgtgaacca tcacccaa 598 12 605 DNA Thermus sp. modified_base (2)..(12) a, t, c or g 12 tnnnnnnnnn nnattgacgg caatgcnact tgcatgcctg caggtcgact ctagaggatc 60 cccgggtacc gagctcgaat tcgtaatcat ggtcatagct gtttcctgtg tgaaattgtt 120 atccgctcac aattccacac aacatacgag ccggaagcat aaagtgtaaa gcctggggtg 180 cctaatgagt gagctaactc acattaattg cgttgcgctc actgcccgct ttcgagtcgg 240 gaaacctgtc gtgccagctg cattaatgaa tcggccaacg cgcggggaga ggcggtttgc 300 gtattgggcg ccagggtggt ttttcttttc accagtgaga cgggcaacag ctgattgccc 360 ttcaccgcct ggccctgaga gagttgcagc aagcggtcca cgctggtttg ccccagcagg 420 cgaaaatcct gtttgatggt ggttccgaaa tcggcaaaat cccttataaa tcaaaagaat 480 agcccgagat agggttgagt gttgttccag tttggaacaa gantccacta ttaaagaacg 540 tggactccaa cgtcaaaggg cgaaaaaccg tctatcaggg cgatggccca ctacgtgaac 600 catca 605 13 596 DNA Thermus sp. 
modified_base (14) a, t, c or g 13 gacggccagt gccnagcttg catgcctgca ggtcgactct agaggacccc gggtaccgag 60 ctcgaattcg taatcatggt catagctgtt tcctgtgtga aattgttatc cgctcacaat 120 tccacacaac atacgagccg gaagcataaa gtgtaaagcc tggggtgcct aatgagtgag 180 ctaactcaca ttaattgcgt tcgctcactg cccgctttcc agtcgggaaa cctgtcgtgc 240 cagctgcatt aatgaatcgg ccaacgcgcg gggagaggcg gtttgcgtat tgggcgccag 300 ggtggttttt cttttcacca gtgagacggg caacagctga ttgcccttca ccgcctggcc 360 ctgagagagt tgcagcaagc ggtccacgct ggtttgcccc agcaggcgaa aatcctgttt 420 gatggtggtt ccgaaatcgg caaaatccct tataaatcaa aagaatagcc gagatagggt 480 tgagtgttgt tccagtttgg aacaagantc cactattaaa gaacgtggac tccaacgtca 540 aagggcgaaa aaccgtctat cagggcgatg gcccactacg tgaaccatca cccaaa 596 14 602 DNA Thermus sp. modified_base (2)..(3) a, t, c or g 14 tnntnnnnnn atttgacggc agtgcnncct tgcatgcctg caggtcgact ctagaggacc 60 ccgggtaccg agctcgaatt cgtaatcatg gtcatagctg tttcctgtgt gaaattgtta 120 tccgctcaca attccacaca acatacgaag ccggaagcat aaagtgtaaa gcctggggtg 180 cctaatgagt gagctaactc acattaattg cgttgcgctc actgcccgct ttccagtcgg 240 gaaacctgtc gtgccagctg cattaatgaa tcggccaacg cgcggggaga ggcggtttgc 300 gtattgggcg ccagggtggt ttttcttttc accagtgaga cgggcaacag ctgattgccc 360 ttcaccgcct ggccctgaga nagttgcagc aagccgtcca cgctggtttg ccccagcagg 420 cgaaaatcct gtttgatggt ggttccgaaa atcgcaaaat cccttataat caaaaaaata 480 cccgaaatag ggttaatgtt gttccatttt ggaacaaaat ccatattaaa aaagtggact 540 ccacgtcaaa gggcnaaaaa ccgctatcag ggcnangggc cnctacttta accatcccca 600 aa 602 15 602 DNA Thermus sp. 
modified_base (1) a, t, c or g 15 ngacggccag tgccaagctt gcatgcctgc aggtcgactc tagaggatcc ccgggtaccg 60 agctcgaatt cgtaatcatg gtcatagctg tttcctgtgt gaaattgtta tccgctcaca 120 attccacaca acatacgagc cggaagcata aagtgtaaag cctggggtgc ctaatgagtg 180 agctaactca cattaattgc gttgcgctca ctgcccgctt tccagtcggg aaacctgtcg 240 tgccagctgc attaatgaat cggccaacgc gcggggagag gcggtttgcg tattgggcgc 300 cagggtggtt tttcttttca ccantgagac gggcaacagc tgattgccct tcaccgcctg 360 gccctganag agttgcancn ancggtccan ncnngttngc cncnncnngc naannnccnn 420 tnnnanngtn gnncnnannn nnnnnnnnnn nnnnnnnnnn nnnnnannnn nnnanannng 480 gtnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 540 nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn nnnnnnnnnn 600 nn 602 16 597 DNA Thermus sp. modified_base (3)..(5) a, t, c or g 16 ttnnnacngc cagtgccaag cttgcatgcc tgcaggtcga ctctagagga tccccgggta 60 ccgagctcga attcgtaatc atggtcatag ctgtttcctg tgtgaaattg ttatccgctc 120 acaattccac acaacatacg agccggaagc ataaagtgta aagcctgggg tgcctaatga 180 gtgagctaac tcacattaat tgcgttgcgc tcactgcccg ctttccagtc gggaaacctg 240 tcgtgccagc tgcattaatg aatcggccaa cgcgcgggga gaggcggttt gcgtattggg 300 cgccagggtg gtttttcttt tcaccagtga gacgggcaac agctgattgc ccttcaccgc 360 ctggccctga gagagttgca gcaagcggtc cacgctggtt tgccccaaca ngcgaaaatc 420 ctgtttgatg gtggttccga aatcngcnaa atcccttatn aatcnnaana atacccgaga 480 tanggttgag tgtnntccan tnnggancnn natccncnan nnnnnacntn nanccnnnnt 540 cnaanggcna anancnngcn nnnnnggcna ngnnnnnnnn tnnnnnnnnn nnnnnnn 597 17 605 DNA Thermus sp. 
modified_base (16) a, t, c or g 17 cgacggccag taccgncttg catgcctgca ggtcgactct agaggatccc cgggtaccga 60 gctcgaattc gtaatcatgg tcatagctgt ttcctgtgtg aaattgttat ccgctcacaa 120 ttccacacaa catacgagcc ggaagcataa agtgtaaagc ctggggtgcc taatgagtga 180 gctaactcac attaattgcg ttgcgctcac tgcccgcttt ccagtcggga aacctgtcgt 240 gccagctgca ttaatgaatc ggccaacgcg cggggagagg cggtttgcgt attgggcgcc 300 agggtggttt ttcttttcac cagtgagacg ggcaacagct gattgccctt caccgcctgg 360 ccctgagaga gttgcagcaa gcggtccacg ctggtttgcc ccagcaggcg aaaatcctgt 420 ttgatggtgg ttccgaaatc ggcaaaatcc cttataaatc aaaagaatag cccgagatag 480 ggttgagtgt tgttccagtt tggaacaaga ntccactatt aaagaacgtg gactccaacg 540 tcaaagggcg aaaaaccgtc tatcaggggc gaaggccact acntgaacca tcacccaaat 600 caagt 605 18 601 DNA Thermus sp. modified_base (1) a, t, c or g 18 nacggncatt gccnancttg catgccttgc aggtcgactc tagaggatcc ccgggtaccg 60 agctcgaatt cgtaatcatg gtcatagctg tttcctgtgt gaaattgtta tccgctcaca 120 attccacaca acatacgagc cggaagcata aagtgtaaag cctggggtgc ctaatgagtg 180 agctaactca cattaattgc gttgcgctca ctgcccgctt tccagtcggg aaacctgtcg 240 tgccagctgc attaatgaat cggccaacgc gcggggagag gcggtttgcg tattgggcgc 300 cagggtggtt tttcttttca ccagtgagac gggcaacagc tgattgccct tcaccgcctg 360 gccctgagag agttgcagca agcggtccac gctggtttgc cccancaggc gaaaatcctg 420 tttgatggtg gttccgaaat cggcaaaatc ccttataaat caaaagaata gcccgagata 480 gggttgagtg ttgttccagt ttggaacaag antccactat taaagaacgt ggactccaac 540 gtcaaagggc gaaaaaccgt ctatcagggc gatgcccact acgtgaacca tcacccaaat 600 c 601 19 601 DNA Thermus sp. 
modified_base (2) a, t, c or g 19 cngtcatacc gagcttgcat gcctgcaggt cgactctaga ggatccccgg gtaccgagct 60 cgaattcgta atcatggtca tagctgtttc ctgtgtgaaa ttgttatccg ctcacaattc 120 cacacaacat acgagccgga agcataaagt gtaaagcctg gggtgcctaa tgagtgagct 180 aactcacatt aattgcgttg cgctcactgc ccgctttcca gtcgggaaac ctgtcgtgcc 240 agctgcatta atgaatcggc caacgcgcgg ggagaggcgg tttgcgtatt gggcgccagg 300 gtggtttttc ttttcaccag tgagacgggc aacagctgat tgcccttcac cgcctggccc 360 tgagagagtt gcagcaagcg gtccacgctg gtttgcccca gcaggcgaaa atcctgtttg 420 atggtggttc cgaaatcggc aaaatccctt ataaatcaaa agaatagccc gagatagggt 480 tgagtgttgt tccagtttgg aacaagagtc cactattaaa gaacgtggac tccaacgtca 540 aagggcgaaa aaccgtctat cagggcgatg gcccactacn tgaaccatca cccaaatcaa 600 g 601 20 619 DNA Thermus sp. modified_base (1) a, t, c or g 20 nangacggca gtgccaagct tgcatgcctg caggtcgact ctagaggatc cccgggtacc 60 gagctcgaat tcgtaatcat ggtcatagct gtttcctgtg tgaaattgtt atccgctcac 120 aattccacac aacatacgag ccggaagcat aaagtgtaaa gcctggggtg cctaatgagt 180 gagctaactc acattaattg cgttgcgctc actgcccgct ttccagtcgg gaaacctgtc 240 gtgccagctg cattaatgaa tcggccaacg cgcggggaga ggcggtttgc gtattgggcg 300 ccagggtggt ttttcttttc accagtgaga cgggcaacag ctgattgccc ttcaccgcct 360 ggccctgaga ganttgcagc aagcggtcca cgctggtttg ccccagcagg cgaaaatcct 420 gtttgatggt ggttccgaaa tcggcaaaat cccttataaa tcaaaaagaa tagcccgaga 480 tagggttgag tgttgttccc antttgggaa caanaatccc acttattaaa gaaactggan 540 tcccaacgtc aaagggcgaa aaaaaccgtc tancaggggc gaanggcccn ctncntgaac 600 cnnccncccc aaatcaaat 619 21 605 DNA Thermus sp. 
modified_base (1) a, t, c or g 21 nttangacgg gccagtgnca atcttgcatg cctgcaggtc gactctagag gatccccggg 60 ttaccgagct cgaattcgta atcatggtca tagctgtttc ctgtgtgaaa ttgttatccg 120 ctcacaattc cacacaacat acgagccgga agcataaagt gtaaagcctg gggtgcctaa 180 tgagtgagct aactcacatt aattgcgttg cgctcactgc ccgctttcca gtcgggaaac 240 ctgtcgtgcc agctgcatta atgaatcggc caacgcgcgg ggagaggcgg tttgcgtatt 300 gggcgccagg gtggtttttc ttttcaccag tgagacgggc aacagctgat tgcccttcac 360 cgcctggccc tgagagagtt gcagcaagcg gtccacgctg gtttgcccca gcaggcgaaa 420 atcctgtttg atggtggttc cgaaatcggc aaaatccctt ataaatcaaa agaatagccc 480 gagatagggt tgagtgttgt tccagtttgg aacaagatcc actattaaag aacgtggact 540 ccaacgtcaa agggcgaaaa acgtctatca gggcganggc ccactacgtg aaccatcacc 600 caaat 605 22 602 DNA Thermus sp. modified_base (2) a, t, c or g 22 anttcattgc caagcttgca tgcctgcagg tcgactctag aggatccccg ggtaccgagc 60 tcgaattcgt aatcatggtc atagctgttt cctgtgtgaa attgttatcc gctcacaatt 120 ccacacaaca tacgagccgg aagcataaag tgtaaagcct ggggtgccta atgagtgagc 180 taactcacat taattgcgtt gcgctcactg cccgctttcc agtcgggaaa cctgtcgtgc 240 cagctgcatt aatgaatcgg ccaacgcgcg gggagaggcg gtttgcgtat tgggcgccag 300 ggtggttttt cttttcacca gtgagacggg caacagctga ttgcccttca ccgcctggcc 360 ctgagagagt tgcagcaagc ggtccacgct ggtttgcccc agcaggcgaa aatcctgttt 420 gatggtggtt ccgaaatcgg caaaatccct tataaatcaa aagaatagcc cgagataggg 480 ttgagtgttg ttccagtttg gaacaagatc cactattaaa gaacgtggac tccaacgtca 540 aagggcgaaa aaccgtctat cagggcgang gcccactacg tgaancatca ccaaatcaag 600 tt 602 23 605 DNA Thermus sp. 
modified_base (1) a, t, c or g 23 ngacggccag tgccaagctt gcatgcctgc aggtcgactc tagaggatcc ccgggtaccg 60 agctcgaatt cgtaatcatg gtcatagctg tttcctgtgt gaaattgtta tccgctcaca 120 attccacaca acatacgagc cggaagcata aagtgtaaag cctggggtgc ctaatgagtg 180 agctaactca cattaattgc gttgcgctca ctgcccgctt tccagtcggg aaacctgtcg 240 tgccagctgc attaatgaat cggccaacgc gcggggagag gcggtttgcg tattgggcgc 300 cagggtggtt tttcttttca ccagtgagac gggcaacagc tgattgccct tcaccgcctg 360 gccctgagag agttgcagca agcggtccac gctggtttgc cccagcaggc gaaaatcctg 420 tttgatggtg gttccgaaat cggcaaaatc ccttataaat caaaagaata gcccgagata 480 gggttgagtg ttgttccagt ttggaacaag antccactat taaagaacgt ggactccaac 540 gtcaaagggc gaaaaaccgt ctatcagggc gatggcccac tacgtgaacc atcacccaaa 600 tcaas 605 24 604 DNA Thermus sp. modified_base (512) a, t, c or g 24 cgacggccag tgccaagctt gcatgcctgc aggtcgactc tagaggatcc ccgggtaccg 60 agctcgaatt cgtaatcatg gtcatagctg tttcctgtgt gaaattgtta tccgctcaca 120 attccacaca acatacgagc cggaagcata aagtgtaaag cctggggtgc ctaatgagtg 180 agctaactca cattaattgc gttgcgctca ctgcccgctt tccagtcggg aaacctgtcg 240 tgccagctgc attaatgaat cggccaacgc gcggggagag gcggtttgcg tattgggcgc 300 cagggtggtt tttcttttca ccagtgagac gggcaacagc tgattgccct tcaccgcctg 360 gccctgagag agttgcagca agcggtccac gctggtttgc cccagcaggc gaaaatcctg 420 tttgatggtg gttccgaaat cggcaaaatc ccttataaat caaaagaata gcccgagata 480 gggttgagtg ttgttccagt ttggaacaag antccactat taaagaacgt ggactccaac 540 gtcaaagggc gaaaaacgtc tatcagggcg atggcccact acgtgaacca tcacccaaat 600 caag 604 25 45 DNA Artificial Sequence Description of Artificial Sequence Primer 25 ctgttcgaac ccaaaggccg tgtcctcctg gtggccggcc accac 45 26 24 DNA Artificial Sequence Description of Artificial Sequence Primer 26 gaggctgccg aattccagcc tctc 24 27 23 DNA Artificial Sequence Description of Artificial Sequence Primer 27 gttttcccag tcacgacgtt gta 23 

What is claimed is:
 1. A purified recombinant thermostable DNA polymerase comprising the amino acid sequence set forth in FIG. 1 (SEQ ID No. 2).
 2. A kit for sequencing DNA comprising the DNA polymerase of claim 1.